Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix site-packages identification. #2262

Merged
merged 3 commits into from
Oct 9, 2023

Conversation

jsirois
Copy link
Member

@jsirois jsirois commented Oct 9, 2023

Fix site-packages identification.

In the process of working a new feature, it was discovered that
sys.path manipulations leaked into PythonInterpreter when they were
performed before the __pex__ import hook. This is fixed by always
going through the interpreter disk cache path, which has properly
isolated sys.path for quite some time, eliminating the one exception
for the current interpreter. Fixing this case caused several tests to
fail that exposed several eager uses of PythonInterpreter.get() that
would execute in import of various modules. Those cases are fixed to
either be lazy or use sys.executable.

Along the way, fix up the ~TODO for migrating from distutils.sysconfig
to sysconfig for determining the site-packages directories. This
should preempt issues with vendor (e.g.: Debian) supplied Python 3.12+.

In the process of working a new feature, it was discovered that
`sys.path` manipulations leaked into `PythonInterpreter` when they were
performed before the `__pex__` import hook. This is fixed by always
going through the interpreter disk cache path, which has properly
isolated `sys.path` for quite some time, eliminating the one exception
for the current interpreter. Fixing this case caused several tests to
fail that exposed several eager uses of `PythonInterpreter.get()` that
would execute in import of various modules. Those cases are fixed to
either be lazy or use `sys.executable`.

Along the way, fix up the ~TODO for migrating from `distutils.sysconfig`
to `sysconfig` for determining the site-packages directories. This
should preempt issues with vendor (e.g.: Debian) supplied Python 3.12+.
@jsirois jsirois requested review from kaos, benjyw and huonw October 9, 2023 00:00
@jsirois
Copy link
Member Author

jsirois commented Oct 9, 2023

This all came up in work towards #2097 and #1361.

@@ -979,9 +998,6 @@ def _spawn_from_binary(cls, binary):
cached_interpreter = cls._PYTHON_INTERPRETER_BY_NORMALIZED_PATH.get(canonicalized_binary)
if cached_interpreter is not None:
return SpawnedJob.completed(cached_interpreter)
if canonicalized_binary == cls.canonicalize_path(sys.executable):
current_interpreter = cls(PythonIdentity.get())
Copy link
Member Author

@jsirois jsirois Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the bug. Here PythonIdentity.get() slurps up the ambient sys.path which may have been mucked with. Totally classic premature optimization. Don't optimize, instead just don't pessimize. Do the simplest thing, you're not smart enough to do more.

@jsirois
Copy link
Member Author

jsirois commented Oct 9, 2023

Ok, and the CI failure was real. The vendoring process is very titchy, very bootstrappy. The PythonInterpreter.get semantics going from in-mem to through the file cache system needs the vendored code to pre-exist, which triggered this new bootstrap tangle.

Copy link
Collaborator

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the moving parts in detail, but the code certainly seems to match the PR description, AFAIK. 👍

pex/pex.py Outdated
for path in paths:
if path in seen:
continue
seen.add(path)
Copy link
Collaborator

@huonw huonw Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this set just an optimisation, or is trying to be a guarantee that all output paths are unique?

If it's meant to be a guarantee, as written, it seems possible that this could be violated, e.g.:

  • the abspath codepath: paths == ["a", "/path/to/cwd/a"] while running with /path/to/cwd and thus get a sequence like ["a", "/path/to/cwd/a", "/path/to/cwd/a"] (where the first two correspond to the first input element, and the last corresponds to the second input element)
  • the realpath codepath: paths == ["/a", "/link-to-a"] where os.path.realpath("/link-to-a") == "/a"
  • the combination of the two: AFAIK, os.path.realpath will absolutise a relative path, and thus not isabs(path) implies os.path.realpath(path) != path and both extra yields will happen

Maybe this private function is not ever actually called with problematic args, though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The questions the code raised are enough here to justify simplifying at the expense of slightly more work. Fixed.

@@ -61,16 +61,12 @@ def register(
),
)

current_interpreter = PythonInterpreter.get()
program = sys.argv[0]
singe_interpreter_info_cmd = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the typo to single_ while you're here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jsirois jsirois merged commit 1f20e4a into pex-tool:main Oct 9, 2023
24 checks passed
@jsirois jsirois deleted the 3.12/fix-site-packages branch October 9, 2023 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants